Abstract

Let n and k be positive integers, and let F be an alphabet of size 
n. A sequence over F of length m is a k-radius sequence if any two 
distinct elements of F occur within distance k of each other somewhere 
in the sequence. These sequences were introduced by Jaromczyk and
Lonc in 2004, in order to produce an efficient caching strategy when
computing certain functions on large data sets such as medical images. 

Let $f_k(n)$ be the length of the shortest n-ary k-radius sequence.
The paper shows, using a probabilistic argument, that whenever k is
fixed and $n \to \infty$,

\[ f_k(n) \sim \frac{1}{k}\binom{n}{2}. \]

The paper observes that the same argument generalises to the situation
when we require the following stronger property for some integer t such
that $2 \le t \le k + 1$: any t distinct elements of F must simultaneously
occur within a distance k of each other somewhere in the sequence.



1 Introduction 

Let n and k be positive integers, and let F be an alphabet of size n. A
sequence $a_1, a_2, \ldots, a_m$ over F of length m is a k-radius sequence if for all
$x, y \in F$ there exist $i, j \in \{1, 2, \ldots, m\}$ such that $a_i = x$, $a_j = y$ and
$|i - j| \le k$. The following is an example of an 8-ary 3-radius sequence over
the alphabet F = {0, 1, 2, 3, 4, 5, 6, 7}:

0,1,2,3,4,5,6,7,0,1,2,4,5,6,3,7.

We write $f_k(n)$ for the length of the shortest n-ary k-radius sequence; so
the example above shows that $f_3(8) \le 16$.
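The definition is easy to test mechanically. The following Python sketch checks the k-radius property by brute force and applies it to the example above; the function name `is_k_radius` is ours and purely illustrative.

```python
from itertools import combinations

def is_k_radius(seq, k, alphabet):
    """Return True if every pair of distinct symbols of `alphabet`
    occurs at two positions of `seq` that are at most k apart."""
    covered = set()
    for i, x in enumerate(seq):
        # Look ahead at most k positions from position i.
        for y in seq[i + 1 : i + k + 1]:
            if x != y:
                covered.add(frozenset((x, y)))
    return all(frozenset(p) in covered for p in combinations(alphabet, 2))

example = [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 4, 5, 6, 3, 7]
print(is_k_radius(example, 3, range(8)))  # True: all 28 pairs are covered
print(is_k_radius(example, 2, range(8)))  # False: 0 and 3 are never within distance 2
```

The same sequence fails for k = 2, which illustrates that the property depends on the radius as well as on the sequence.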

The concept of a k-radius sequence was introduced by Jaromczyk and
Lonc [11]. They were interested in these sequences so they could design an
efficient caching strategy to compute a function that depends on comparing
pairs of a sequence of large data sets such as medical images (see the
discussion in Section 3 below).

Ghosh [8] showed that

\[ f_1(n) = \begin{cases} \binom{n}{2} + 1 & \text{when } n \text{ is odd;} \\ \binom{n}{2} + n/2 & \text{when } n \text{ is even.} \end{cases} \]

Jaromczyk and Lonc [11] showed that $f_2(n) = \frac{1}{2}\binom{n}{2} + O(n^2/\log n)$, and
gave a construction for k-radius sequences of the right order of magnitude
but with a leading term that is not tight. Chee, Ling, Tan and Zhang [5]
provided good constructions for n-ary 2-radius sequences for small values of
n. Blackburn and McKee [2] showed how to construct asymptotically good
k-radius sequences for many values of k. In particular, their constructions
show that $f_k(n) = \frac{1}{k}\binom{n}{2} + O(n^2/\log n)$ whenever $k \le 194$, or k + 1 is a prime,
or 2k + 1 is a prime. They asked whether $\lim_{n \to \infty} f_k(n)/\binom{n}{2}$ exists and is
equal to 1/k. The main purpose of this paper is to answer this question
positively, by proving the following theorem.

Theorem 1. Let k be a fixed positive integer. Then

\[ f_k(n) \sim \frac{1}{k}\binom{n}{2} \]

as $n \to \infty$.

We use probabilistic methods, our main tool being Pippenger and Spencer's 
version of the Frankl-Rödl theorem on the size of the matchings in a quasi-
random hypergraph [HI [H] (see Section [3]) . 
Theorem 1 is proved in the next section. In the final section of the paper
we observe that the proof of this theorem can be generalised (Theorem 4) to
the situation where we are interested in subsets of t elements, rather than
pairs of elements, from the alphabet. The final section also contains various
comments and open problems.

2 The proof of Theorem 1

We begin with an elementary lemma which establishes a lower bound on
$f_k(n)$. Jaromczyk and Lonc [11] prove a slightly stronger version of this
lemma in their paper.

Lemma 2. For all positive integers n and k,

\[ f_k(n) \ge \frac{1}{k}\binom{n}{2}. \]

Proof. Let $a_1, a_2, \ldots, a_m$ be an n-ary k-radius sequence. There are fewer than
km pairs of the form $\{a_i, a_{i+z}\}$ where $i, i + z \in \{1, 2, \ldots, m\}$ and $1 \le z \le k$.
The k-radius sequence property implies that every unordered pair of alphabet
symbols must occur at least once as a pair $\{a_i, a_{i+z}\}$ for some i and z, and
so $km > \binom{n}{2}$. □

To provide an upper bound on $f_k(n)$, we use a well-known theorem in
hypergraph theory. Recall that a hypergraph $\Gamma$ is r-uniform if all its hyperedges
have cardinality r. The degree deg(v) of a vertex $v \in \Gamma$ is the number
of hyperedges containing v; the codegree codeg(v, w) of a pair of distinct vertices
$v, w \in \Gamma$ is the number of hyperedges containing both v and w. A cover
is a set of hyperedges in $\Gamma$ whose union is equal to the set of all vertices of $\Gamma$.

Theorem 3. Fix an integer r and a positive real number $\delta$. Then there exists
an integer $n_0$ and a positive real number $\delta'$ with the following property.

Let $\Gamma$ be an r-uniform hypergraph on n vertices, where $n > n_0$. Suppose
that all vertices of $\Gamma$ have degree d for some integer d. Let $c = \max \mathrm{codeg}(u, v)$,
where the maximum is taken over all pairs of distinct vertices $u, v \in \Gamma$. If
$c < \delta' d$, then there exists a cover consisting of at most $(1 + \delta)n/r$ hyperedges.

Theorem 3 can be proved using a second-moment technique that Alon
and Spencer [1] call the 'Rödl nibble' (see [TH [6] or [3, Theorem 8.4] for
example). Also see Pippenger and Spencer [HI Theorem 1.1] for a stronger
result.

Proof of Theorem 1. We need to prove that $\lim_{n \to \infty} f_k(n)/\binom{n}{2} = 1/k$. Now
$f_k(n)/\binom{n}{2} \ge 1/k$ by Lemma 2. Let $\varepsilon$ be a fixed positive real number. To
prove the theorem, it suffices to show that for all sufficiently large integers
n, we have that $f_k(n)/\binom{n}{2} < (1 + \varepsilon)/k$.

Choose an integer $\ell$ and a positive real number $\delta$ such that $\ell > k$ and

\[ \frac{1 + \delta}{1 - \frac{1}{2}(k + 1)/\ell} < 1 + \varepsilon. \]
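Such a choice is always possible, since the left-hand side of the condition $(1 + \delta)/(1 - \frac{1}{2}(k + 1)/\ell) < 1 + \varepsilon$ tends to $1 + \delta$ as $\ell$ grows. As a concrete illustration, the following sketch searches for the smallest suitable $\ell$ using exact rational arithmetic. The function name `choose_parameters` and the choice $\delta = \varepsilon/2$ are our own assumptions for the example; any $\delta < \varepsilon$ would do.

```python
from fractions import Fraction

def choose_parameters(k, eps):
    """Return (ell, delta) with ell > k and
    (1 + delta) / (1 - (k + 1) / (2 * ell)) < 1 + eps.

    delta = eps / 2 is an arbitrary illustrative choice.
    """
    eps = Fraction(eps)
    delta = eps / 2
    ell = k + 1  # smallest integer with ell > k; the denominator below is then 1/2
    while (1 + delta) / (1 - Fraction(k + 1, 2 * ell)) >= 1 + eps:
        ell += 1
    return ell, delta

ell, delta = choose_parameters(3, Fraction(1, 10))
print(ell, delta)  # 45 1/20
```

For k = 3 and $\varepsilon = 1/10$ the search stops at $\ell = 45$, which gives a feel for how long the blocks in the construction need to be.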

Let n be an integer such that $n > \ell$, and let F be a set of cardinality n.
Define a hypergraph $\Gamma_n$ as follows. The vertices of $\Gamma_n$ are the $\binom{n}{2}$ unordered
pairs {x, y} where $x, y \in F$. The hyperedges of $\Gamma_n$ are the
$n(n - 1) \cdots (n - (\ell - 1))$ sequences $b = b_1, b_2, \ldots, b_\ell$ of length $\ell$ over F whose entries $b_i$ are
all distinct. We define a vertex {x, y} to lie in a hyperedge b whenever x
and y occur in b within a distance of k; more precisely, whenever there exist
$i, j \in \{1, 2, \ldots, \ell\}$ such that $b_i = x$, $b_j = y$ and $|i - j| \le k$.

Let r be the number of ways of choosing an (unordered) pair of distinct
positions in a sequence of length $\ell$, where the positions are at most a distance
k apart. So

\[ r = (\ell - k)k + \sum_{i=0}^{k-1} i. \]
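The formula for r can be checked against a direct count. The sketch below enumerates all pairs of positions in a sequence of a given length and compares the result with the formula; the function names are ours and are only illustrative.

```python
from itertools import combinations

def count_close_position_pairs(ell, k):
    """Brute-force count of unordered pairs of distinct positions in a
    sequence of length ell that are at most k apart."""
    return sum(1 for i, j in combinations(range(ell), 2) if j - i <= k)

def r_formula(ell, k):
    """The count as (ell - k) * k plus the sum of 0, 1, ..., k - 1."""
    return (ell - k) * k + sum(range(k))

print(count_close_position_pairs(10, 3), r_formula(10, 3))  # 24 24
```

The first $\ell - k$ positions each have k partners to their right, and the last k positions have k - 1, k - 2, ..., 0 such partners, which is where the two terms come from.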

Clearly r does not depend on n. Since the entries of b are distinct, every
hyperedge in $\Gamma_n$ contains exactly r vertices, and so $\Gamma_n$ is an r-uniform hypergraph.
The degree d of any vertex $v \in \Gamma_n$ is equal to $2r(n - 2)(n - 3) \cdots (n - (\ell - 1))$,
which is of the order of $n^{\ell - 2}$. The codegree of distinct vertices
$v, w \in \Gamma_n$ depends on whether v and w are intersecting when thought of as
pairs of elements of F. But in either case $\mathrm{codeg}(v, w) = O(n^{\ell - 3}) = o(d)$. So
Theorem 3 implies that for all sufficiently large integers n there exists a cover
$b_1, b_2, \ldots, b_s$ for $\Gamma_n$ consisting of s hyperedges, where $s \le (1 + \delta)\binom{n}{2}/r$.
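For small parameters the regularity of the hypergraph can be confirmed exhaustively. The following sketch builds every hyperedge (every sequence of the given length with distinct entries), tallies the degree of each pair, and compares with the product formula for d. The names `vertex_degrees` and `degree_formula` are ours; the enumeration is only feasible for very small n.

```python
from itertools import combinations, permutations
from math import prod

def vertex_degrees(n, ell, k):
    """Degree of every vertex {x, y}: the number of length-ell sequences
    with distinct entries over range(n) in which x and y occur within
    distance k of each other."""
    degree = {frozenset(p): 0 for p in combinations(range(n), 2)}
    for b in permutations(range(n), ell):
        for i, j in combinations(range(ell), 2):
            if j - i <= k:
                # Entries of b are distinct, so each close position pair
                # contributes a different vertex of this hyperedge.
                degree[frozenset((b[i], b[j]))] += 1
    return degree

def degree_formula(n, ell, k):
    """2 r (n - 2)(n - 3) ... (n - (ell - 1)), with r as defined in the text."""
    r = (ell - k) * k + sum(range(k))
    return 2 * r * prod(n - t for t in range(2, ell))

degrees = vertex_degrees(6, 4, 2)
print(set(degrees.values()), degree_formula(6, 4, 2))  # {120} 120
```

Every vertex has the same degree because the construction is symmetric under permutations of the alphabet; the formula counts the r position pairs, the two orders of x and y, and the ways to fill the remaining positions.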

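The definition of $\Gamma_n$ and the fact that the sequences $b_j$ form a cover show
that the concatenation of $b_1, b_2, \ldots, b_s$ is a k-radius sequence. The length
of this sequence is $\ell s$. Since

\[ (\ell - k)k + \sum_{i=0}^{k-1} i = k\ell - \frac{1}{2}k(k + 1), \]

we have

\[
\begin{aligned}
\ell s &\le \ell (1 + \delta) \binom{n}{2} / r \\
&= \frac{1}{k}\binom{n}{2} \frac{\ell (1 + \delta)}{\ell - \frac{1}{2}(k + 1)} \\
&= \frac{1}{k}\binom{n}{2} \frac{1 + \delta}{1 - \frac{1}{2}(k + 1)/\ell} \\
&< (1 + \varepsilon) \frac{1}{k}\binom{n}{2}.
\end{aligned}
\]

So $f_k(n)/\binom{n}{2} < (1 + \varepsilon)/k$ for all sufficiently large integers n, as required. □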
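The concatenation step of the proof can be imitated on a small scale. The sketch below is not the probabilistic cover of Theorem 3: it swaps in a simple greedy heuristic (each new block is forced to cover one still-uncovered pair, and is otherwise random), so it carries no asymptotic guarantee. It only illustrates that concatenating blocks which together cover all pairs yields a k-radius sequence. All function names and the parameter `seed` are our own.

```python
import random
from itertools import combinations

def pairs_within(block, k):
    """Unordered pairs of symbols at distance at most k inside `block`."""
    return {frozenset((block[i], block[j]))
            for i, j in combinations(range(len(block)), 2) if j - i <= k}

def greedy_block_sequence(n, k, ell, seed=0):
    """Concatenate length-ell blocks with distinct entries over range(n)
    until every pair of symbols is covered by some block.
    Assumes 2 <= ell <= n and k >= 1. Greedy heuristic, not the nibble."""
    rng = random.Random(seed)
    uncovered = {frozenset(p) for p in combinations(range(n), 2)}
    blocks = []
    while uncovered:
        x, y = min(sorted(p) for p in uncovered)      # a still-uncovered pair
        others = [z for z in range(n) if z not in (x, y)]
        block = [x, y] + rng.sample(others, ell - 2)  # x, y adjacent, so the pair is covered
        blocks.append(block)
        uncovered -= pairs_within(block, k)
    return [symbol for block in blocks for symbol in block]

seq = greedy_block_sequence(8, 3, 5)
all_pairs = {frozenset(p) for p in combinations(range(8), 2)}
print(len(seq), pairs_within(seq, 3) >= all_pairs)
```

Each block covers at least one new pair, so the loop terminates after at most $\binom{n}{2}$ blocks. The point of Theorem 3 is that a far more economical cover exists when n is large.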



3 Comments 

We have found the leading term for $f_k(n)$ as $n \to \infty$ with k fixed using
probabilistic methods. It would be very interesting to search for explicit
constructions of k-radius sequences that are asymptotically good for any
value of k. (The constructions of Jaromczyk and Lonc [11] and of Blackburn
and McKee [2] only lead to asymptotically good constructions for some values
of k.) The following problem would also be very interesting:

Open Problem 1. Provide an upper bound (using explicit or probabilistic
methods) of the form

\[ f_k(n) \le \frac{1}{k}\binom{n}{2} + g(n), \]

where g(n) is a function of n that grows significantly more slowly than $n^2$.

Note added in final revision: A recent preprint of Jaromczyk, Lonc and
Truszczynski [12] provides some beautiful recursive constructions of k-radius
sequences, solving Open Problem 1. Indeed, they show that we may take
$g(n) = O(n^{1 + \varepsilon})$ for any positive real number $\varepsilon$. They also give optimal
constructions of 2-radius sequences when n = 2p with p a prime.

We now discuss the caching application that motivated Jaromczyk and
Lonc in a little more detail. Suppose we have a total of n medical images,
and we wish to compute some function which depends on all pairs of these
images. We assume that the computation involving each pair of images is
computationally intensive, so we wish to place these images in our cache
before carrying out this computation. We assume our cache can hold up to
k + 1 images at one time. Then an n-ary k-radius sequence will enable us
to design an efficient caching strategy, as follows. Let $a_1, a_2, \ldots, a_m$ be an
n-ary k-radius sequence. Suppose we load image $a_t$ into our cache at time
t, using a first-in first-out caching strategy. So at time t (for $t \ge k + 1$) our
cache holds the images $a_{t-k}, a_{t-k+1}, \ldots, a_t$. The property of being a k-radius
sequence implies that any pair of alphabet symbols occurs in some window
of length k + 1 in the sequence, and so any pair of images simultaneously lies
in our cache at some point. Short sequences correspond to efficient caching
strategies for this problem.
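The first-in first-out strategy is straightforward to simulate. In the sketch below the cache is modelled by a bounded double-ended queue, and we record which pairs of images are ever in the cache together; the function name `pairs_seen_by_fifo_cache` is ours, and images are represented simply by their labels.

```python
from collections import deque
from itertools import combinations

def pairs_seen_by_fifo_cache(load_order, k):
    """Load images in the given order into a FIFO cache of k + 1 slots and
    return the set of pairs of distinct images that are cached together."""
    cache = deque(maxlen=k + 1)  # appending to a full deque evicts the oldest entry
    seen = set()
    for image in load_order:
        cache.append(image)
        seen.update(frozenset((image, other)) for other in cache if other != image)
    return seen

order = [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 4, 5, 6, 3, 7]  # the 3-radius example
all_pairs = {frozenset(p) for p in combinations(range(8), 2)}
print(pairs_seen_by_fifo_cache(order, 3) == all_pairs)  # True
```

With the 8-ary 3-radius sequence from the Introduction and a cache of 4 slots, 16 loads suffice to bring every one of the 28 pairs of images into the cache together.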

We might ask what the consequences are of removing our insistence on
a first-in first-out strategy in the application above. But whatever caching
strategy is used it is clear that at most k new pairs of images are introduced
into our cache at every time period: the bound of Lemma 2 holds for any
caching strategy. So the results of this paper show that imposing the restriction
to a first-in first-out strategy does not affect the asymptotic efficiency.
We should also remark that when we are not imposing the restriction to a
first-in first-out strategy there is a simple caching method that gives asymptotically
tight results, which can be described as follows. We begin by loading
the first batch of k images 1, 2, ..., k into the cache. Our cache can store one
more image: keeping our initial batch of images in our cache, we load all the
remaining images in turn. So at time k + i where $1 \le i \le n - k$, the cache
holds images 1, 2, ..., k and k + i. We then continue with the next batch
of k images k + 1, k + 2, ..., 2k: at time n + k + i where $1 \le i \le n - 2k$ the
cache holds images k + 1, k + 2, ..., 2k and 2k + i. We continue in this way,
first loading a batch of k images into our cache and then using the remaining
space to load each of the later images in turn.
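The batch method can be simulated in a few lines. The sketch below labels the images 0, 1, ..., n - 1 (rather than 1, ..., n), allows a shorter final batch when k does not divide n (a detail the description above leaves implicit, so this is our assumption), and counts the number of loads. The function name `batch_caching` is ours.

```python
from itertools import combinations

def batch_caching(n, k):
    """Simulate the batch strategy on images 0, 1, ..., n-1 with a cache
    of k + 1 slots. Returns (number_of_loads, pairs_cached_together)."""
    loads = 0
    together = set()
    for start in range(0, n, k):
        batch = list(range(start, min(start + k, n)))
        loads += len(batch)                    # load the batch itself
        together.update(frozenset(p) for p in combinations(batch, 2))
        for extra in range(batch[-1] + 1, n):  # each later image uses the spare slot
            loads += 1
            together.update(frozenset((b, extra)) for b in batch)
    return loads, together

loads, together = batch_caching(8, 3)
print(loads, len(together))  # 15 28
```

When k divides n, the number of loads is $n + (n - k) + (n - 2k) + \cdots + k = n^2/(2k) + n/2$, which has the same leading term as the lower bound $\frac{1}{k}\binom{n}{2}$ of Lemma 2.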

The results of this paper are easily generalised to a wider class of combinatorial
objects. Let k, t and n be fixed positive integers, with $t \le n$. Let F
be an alphabet of cardinality n. We may define a t-subset k-radius sequence
over F to be a finite sequence $a_1, a_2, \ldots, a_m$ over F such that for all t-subsets
$X \subseteq F$, there exists $i \in \{1, 2, \ldots, m - k\}$ such that

\[ X \subseteq \{a_i, a_{i+1}, \ldots, a_{i+k}\}. \]

So a k-radius sequence satisfies this definition in the special case when t = 2.
Let $f_{t,k}(n)$ be the length of the shortest t-subset k-radius sequence.
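The generalised definition can also be checked by brute force. The sketch below forms every window of k + 1 consecutive entries and tests whether each t-subset of the alphabet lies inside some window; the function name `is_t_subset_k_radius` is ours.

```python
from itertools import combinations

def is_t_subset_k_radius(seq, t, k, alphabet):
    """Return True if every t-subset of `alphabet` is contained in some
    window of k + 1 consecutive entries of `seq`."""
    # Windows start at 0-based positions 0, ..., m - k - 1,
    # matching i in {1, ..., m - k} in the definition.
    windows = [set(seq[i : i + k + 1]) for i in range(len(seq) - k)]
    return all(any(set(X) <= w for w in windows)
               for X in combinations(alphabet, t))

example = [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 4, 5, 6, 3, 7]
print(is_t_subset_k_radius(example, 2, 3, range(8)))           # True
print(is_t_subset_k_radius([0, 1, 2, 3, 0, 1], 3, 2, range(4)))  # True
print(is_t_subset_k_radius([0, 1, 2, 3], 3, 2, range(4)))        # False
```

The second call covers all four 3-subsets of a 4-letter alphabet with windows of length 3; the third fails because, for instance, {0, 1, 3} never appears in a window.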
Theorem 4. Let k and t be fixed integers such that $2 \le t \le k + 1$. Then

\[ f_{t,k}(n) \sim \frac{1}{\binom{k}{t-1}}\binom{n}{t} \]

as $n \to \infty$.

Proof. The lower bound follows by observing that each new element added
to a sequence can 'cover' at most $\binom{k}{t-1}$ new subsets X (as these new subsets
must involve the new element).
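Spelled out, the counting step might read as follows. This is a sketch in our own notation: m denotes the length of an arbitrary t-subset k-radius sequence $a_1, a_2, \ldots, a_m$ over F.

```latex
% When a_j is appended, a newly covered t-subset X must contain a_j, and its
% other t - 1 elements must lie among the k preceding entries
% a_{j-k}, ..., a_{j-1}. So step j covers at most \binom{k}{t-1} new subsets.
% All \binom{n}{t} subsets must be covered after m steps, hence
\[
  \binom{n}{t} \le m \binom{k}{t-1},
  \qquad\text{and so}\qquad
  f_{t,k}(n) \ge \frac{1}{\binom{k}{t-1}} \binom{n}{t}.
\]
% For t = 2 this reduces to the bound f_k(n) >= \binom{n}{2} / k.
```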

The upper bound follows as in the proof of Theorem 1. So we define the
hypergraph $\Gamma_n$ to have hyperedges as before, with vertices the t-subsets of
the n-set F, and with a vertex lying in a hyperedge b if and only if the subset
is contained in a set of k + 1 consecutive elements of the sequence b. The
hypergraph $\Gamma_n$ is r-uniform, where $r = \ell\binom{k}{t-1} + h(k, t)$ for some fixed function h
of k and t. The degree of a vertex in $\Gamma_n$ does not depend on the vertex and
is of the order of $n^{\ell - t}$, whereas the codegree of a pair of vertices depends on
the size of intersection of the subsets the vertices are identified with, but is
at most $O(n^{\ell - t - 1})$. We may use Theorem 3 to obtain a small cover for $\Gamma_n$,
and then concatenate the resulting sequences in this cover to obtain a short
t-subset k-radius sequence, just as in the proof of Theorem 1. □

Open Problem 2. Find good explicit constructions of t-subset k-radius
sequences.

This problem has been considered in the case t = k + 1 by Lonc, Traczyk,
and Truszczynski [13]. The authors show that $f_{k+1,k}(n) = \binom{n}{k+1} + O(n^{\lfloor k/2 \rfloor})$,
determine $f_{3,2}(n)$ exactly and determine $f_{4,3}(n)$ and $f_{5,4}(n)$ for infinitely many
values of n.

The corresponding packing rather than covering problem is also interesting
combinatorially (although we do not know of an application). Here
we may define a packing t-subset k-radius sequence over F to be a sequence
$a_1, a_2, \ldots, a_m$ over F with the property that any t-subset $X \subseteq F$ only occurs
as a subset of $\{a_i, a_{i+1}, \ldots, a_{i+k}\}$ in at most one position in the sequence.
More precisely, we require that for all t-subsets $X \subseteq F$ there exists at most
one choice for an increasing sequence $z_1, z_2, \ldots, z_t$ of integers such that

\[ X = \{a_{z_1}, a_{z_2}, \ldots, a_{z_t}\} \]

and where $|z_t - z_1| \le k$.
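The precise form of the packing condition translates directly into a check. The sketch below enumerates every increasing choice of t positions spanning at most k, records the t-subset found there (if the t symbols are distinct), and verifies that no subset is recorded twice. The function name `is_packing_t_subset_k_radius` is ours.

```python
from collections import Counter
from itertools import combinations

def is_packing_t_subset_k_radius(seq, t, k):
    """Return True if every t-subset arises from at most one increasing
    choice of positions z_1 < ... < z_t with z_t - z_1 <= k."""
    counts = Counter()
    m = len(seq)
    for first in range(m):
        # Choose the remaining t - 1 positions among the next k positions,
        # so that the whole position set spans at most k.
        later = range(first + 1, min(first + k, m - 1) + 1)
        for rest in combinations(later, t - 1):
            symbols = frozenset(seq[z] for z in (first, *rest))
            if len(symbols) == t:  # the positions must carry t distinct symbols
                counts[symbols] += 1
    return all(c <= 1 for c in counts.values())

print(is_packing_t_subset_k_radius([0, 1, 2, 3, 4], 2, 2))  # True
print(is_packing_t_subset_k_radius([0, 1, 0], 2, 1))        # False
```

In the second call the pair {0, 1} occurs at positions (1, 2) and again at positions (2, 3), so the sequence is not a packing.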
Open Problem 3. Define $F_{t,k}(n)$ to be the length of the longest packing
t-subset k-radius sequence. Find good asymptotic lower bounds on $F_{t,k}(n)$,
either using probabilistic or explicit constructions.

This problem has been considered in the case when t = k + 1 by Curtis,
Hines, Hurlbert and Moyer [5] under the name of Ucycle packings. The
authors prove that for any t, we have

\[ F_{t,t-1}(n) = (1 - o(1))\binom{n}{t}. \]

Their work was motivated by the concept of a universal cycle; see jH El fTU] . 